feat(julia): add Julia language support with vendored tree-sitter grammar #290

Open
hexu2 wants to merge 1 commit into colbymchenry:main from hexu2:feat/julia-language-support
Conversation

hexu2 commented May 22, 2026

Summary

Adds first-class Julia (.jl) indexing to CodeGraph using the tree-sitter-julia grammar.

This PR extends and supersedes the groundwork in #244 by @kongdd (happy to co-author), with additional handling for common Julia AST shapes seen in real-world packages (including Julia 1.11):

  • Structs without block wrappers: struct_definition may use typed_expression children instead of a body / block; extractStruct resolves the body via resolveBody for consistency with interface extraction.
  • Functions without block wrappers: many files use a flat statement list directly under function_definition. Handled in juliaExtractor.visitNode: the first signature-shaped child (signature / call_expression / typed_expression / where_expression) is treated as the declaration head, and the remaining children are walked via ctx.visitFunctionBody so nested calls (e.g. out_neighbors, topological_sort) are still indexed. resolveBody deliberately returns null for these so the core visitFunctionBody cannot re-enter extractFunction and recurse.
  • One-line definitions: e.g. f(x) = expr, via assignment handling (not covered in #244).
  • module nodes: module_definition is mapped to the module kind so members are namespace-scoped in qualified_name.
  • macrocall_expression: added to callTypes.
  • Vendored WASM: tree-sitter-julia.wasm is committed under src/extraction/wasm/ (the grammar is not published in tree-sitter-wasms@0.1.11; npm-only loading would fail in CI and offline installs). Same vendoring pattern as lua / luau / pascal / scala.
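The flat-body handling described in the list above can be illustrated with a minimal sketch. It is not the code in this PR: `SketchNode`, `splitFlatFunction`, and `collectCalls` are hypothetical names, and the node shape is a simplified stand-in for a tree-sitter node. Only the four signature-shaped node types are taken from the description.

```typescript
// Hypothetical, simplified stand-in for a tree-sitter syntax node.
interface SketchNode {
  type: string;
  text: string;
  children: SketchNode[];
}

// Node types the PR description lists as "signature-shaped".
const SIGNATURE_TYPES = new Set([
  "signature",
  "call_expression",
  "typed_expression",
  "where_expression",
]);

// Split a flat function_definition into its declaration head and the
// statements that follow it. Children before the head (e.g. the keyword)
// are ignored.
function splitFlatFunction(fn: SketchNode): { head: SketchNode | null; body: SketchNode[] } {
  const headIndex = fn.children.findIndex((c) => SIGNATURE_TYPES.has(c.type));
  if (headIndex === -1) {
    return { head: null, body: fn.children };
  }
  return { head: fn.children[headIndex], body: fn.children.slice(headIndex + 1) };
}

// Collect the callee text of every call_expression under the given nodes.
function collectCalls(nodes: SketchNode[], out: string[] = []): string[] {
  for (const node of nodes) {
    if (node.type === "call_expression") {
      out.push(node.children[0]?.text ?? node.text);
    }
    collectCalls(node.children, out);
  }
  return out;
}

// Assumed shape for: function order(g) out_neighbors(g); topological_sort(g) end
const leaf = (type: string, text: string): SketchNode => ({ type, text, children: [] });
const call = (name: string): SketchNode => ({
  type: "call_expression",
  text: `${name}(g)`,
  children: [leaf("identifier", name), leaf("argument_list", "(g)")],
});
const flatFn: SketchNode = {
  type: "function_definition",
  text: "function order(g) ... end",
  children: [
    leaf("function", "function"),
    call("order"),
    call("out_neighbors"),
    call("topological_sort"),
    leaf("end", "end"),
  ],
};

const { head, body } = splitFlatFunction(flatFn);
const calls = collectCalls(body);
```

In this sketch the first `call_expression` becomes the head, so it is not counted as a call; only the statements after it are walked.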

Also adds the optional getName hook on LanguageExtractor (as proposed in #244) and wires it through extractName.
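A minimal sketch of how an optional hook like this might be wired, assuming a fallback to the existing name-field lookup. `NamedNode`, `HookedExtractor`, the `fields` map, and the `signature` / `callee` field names are hypothetical; the real LanguageExtractor interface and extractName in CodeGraph may differ.

```typescript
// Hypothetical, simplified node shape with named fields.
interface NamedNode {
  type: string;
  text: string;
  fields: Record<string, NamedNode | undefined>;
}

// Hypothetical slice of an extractor: only the parts relevant to naming.
interface HookedExtractor {
  nameField: string;
  // Optional hook: return a name, or null to fall back to nameField.
  getName?: (node: NamedNode) => string | null;
}

function extractName(node: NamedNode, extractor: HookedExtractor): string | null {
  // Extractors that do not define the hook skip straight to the field lookup.
  const hooked = extractor.getName?.(node);
  if (hooked != null) return hooked;
  return node.fields[extractor.nameField]?.text ?? null;
}

const ident = (text: string): NamedNode => ({ type: "identifier", text, fields: {} });

// Name is a direct field child (the pre-existing path).
const plain: NamedNode = {
  type: "function_declaration",
  text: "function foo() {}",
  fields: { name: ident("foo") },
};

// Name is nested one level down (an assumed Julia-like shape).
const nested: NamedNode = {
  type: "function_definition",
  text: "function bar(x) end",
  fields: {
    signature: { type: "call_expression", text: "bar(x)", fields: { callee: ident("bar") } },
  },
};

const defaultExtractor: HookedExtractor = { nameField: "name" };
const juliaLike: HookedExtractor = {
  nameField: "name",
  getName: (node) => node.fields["signature"]?.fields["callee"]?.text ?? null,
};

const plainName = extractName(plain, defaultExtractor);
const nestedName = extractName(nested, juliaLike);
const fallbackName = extractName(plain, juliaLike);
```

Because the hook is optional and returns null when it has nothing to say, extractors that never define it keep their current behaviour.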

Test plan

  • npm run build
  • npm test -- --run -t "Julia Extraction": 11 / 11 tests pass, covering language detection, top-level functions, signatures, macros, structs without block, abstract types, modules, imports / using, one-line f(x) = expr, and call extraction inside flat bodies.
  • Full npm test — extraction & installer-targets suites pass.
  • Real-world: codegraph index over a Julia workspace (~120 .jl files) — julia shows up under "Files by Language", with function / struct / module / import nodes and calls / imports edges in the index.

Notes for maintainers

  • tree-sitter-julia is a devDependency only for rebuilding WASM; runtime uses the committed src/extraction/wasm/tree-sitter-julia.wasm binary.
  • Rebased onto current main; no merge conflicts.
  • New getName hook on LanguageExtractor is optional and used only by Julia today; existing extractors are unaffected.

hexu2 force-pushed the feat/julia-language-support branch from ae7ae0e to 7a3c565 on May 22, 2026, 01:59

kongdd commented May 24, 2026

This may need its conflicts with the main branch resolved.

kongdd commented May 24, 2026

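I propose that we first maintain a version in which Julia works, codegraph can be installed and used normally, and the GitHub CLI can run the tests automatically.
That way, whether upstream accepts this, and how quickly, does not affect our own use.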

hexu2 force-pushed the feat/julia-language-support branch from 7a3c565 to 13a4c11 on May 25, 2026, 01:25
…mmar

Add tree-sitter-julia extraction with vendored WASM and registry wiring.
Builds on colbymchenry#244 (@kongdd) with support for common Julia 1.11 AST shapes:

- getName hook for symbols whose names are not direct nameField children
- Structs without block wrappers (typed_expression children) via resolveBody
- Functions without block wrappers — handled in visitNode: first
  signature-shaped child becomes the declaration head, remaining children
  are walked via visitFunctionBody so calls are still indexed.
  resolveBody returns null for these so the core walker cannot re-enter
  extractFunction and recurse.
- One-line assignment functions (e.g. f(x) = expr)
- module_definition mapped to module kind for namespace-scoped qualified names
- macrocall_expression added to callTypes
- Vendored tree-sitter-julia.wasm (not published in tree-sitter-wasms);
  same pattern as lua/luau/pascal/scala
- Register .jl in grammars, types, and the language extractor map

Co-authored-by: Cursor <cursoragent@cursor.com>
hexu2 force-pushed the feat/julia-language-support branch from 13a4c11 to 6ee89c4 on May 25, 2026, 01:28

hexu2 commented May 25, 2026

@kongdd Thanks for the heads-up! I've rebased onto current main (no merge conflicts) and force-pushed a single squashed commit.

While rebasing I also caught a regression that was masked before: when a Julia function has no block wrapper, the previous resolveBody returned the whole function_definition, which made the core visitFunctionBody re-enter extractFunction recursively — so within the same file only the first function was kept (and its calls were dropped). The fix moves flat-body handling into juliaExtractor.visitNode: the first signature-shaped child becomes the declaration head, and the remaining children are walked via ctx.visitFunctionBody so nested calls (out_neighbors, topological_sort, etc.) are indexed correctly. resolveBody now returns null for functions/macros to make the recursion impossible by construction.
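The re-entry problem can be reproduced in a toy model. This is not the CodeGraph core: `ToyNode`, `countEntries`, and the dispatch inside `visitFunctionBody` are assumptions made for illustration, and the `limit` parameter exists only to stop the toy from recursing forever.

```typescript
// Hypothetical, simplified node shape.
interface ToyNode {
  type: string;
  children: ToyNode[];
}

type ResolveBody = (node: ToyNode) => ToyNode | null;

// Count how many times extractFunction is entered for one definition.
function countEntries(fn: ToyNode, resolveBody: ResolveBody, limit = 5): number {
  let entries = 0;

  const extractFunction = (node: ToyNode): void => {
    entries += 1;
    if (entries >= limit) return; // safety stop for the toy model only
    const body = resolveBody(node);
    if (body !== null) visitFunctionBody(body);
  };

  const visitFunctionBody = (body: ToyNode): void => {
    // Toy dispatch: a function_definition handed in as a "body" is extracted again.
    if (body.type === "function_definition") {
      extractFunction(body);
      return;
    }
    body.children.forEach(visitFunctionBody);
  };

  extractFunction(fn);
  return entries;
}

// A flat function: statements sit directly under function_definition.
const flat: ToyNode = {
  type: "function_definition",
  children: [{ type: "call_expression", children: [] }],
};

// Earlier behaviour as described: the whole definition is returned as its own body.
const returnsWholeNode: ResolveBody = (node) => node;

// Fixed behaviour as described: functions and macros have no resolvable body.
const returnsNull: ResolveBody = (node) =>
  node.type === "function_definition" || node.type === "macro_definition" ? null : node;

const entriesBefore = countEntries(flat, returnsWholeNode);
const entriesAfter = countEntries(flat, returnsNull);
```

With the null-returning resolver, the walker never receives a function_definition as a body, so the extractor is entered exactly once per definition and the flat children are left to the extractor's own visitNode handling.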

Verification:

  • npm run build
  • npm test -- --run -t "Julia Extraction": 11 / 11 pass
  • Full npm test — extraction + installer-targets suites green
  • Real-world: indexed a Julia workspace with ~120 .jl files; julia shows up under "Files by Language" with function / struct / module / import nodes and calls / imports edges

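I fully agree with your earlier proposal: first maintain a version where Julia works and both installation and CI pass, so that the pace of upstream acceptance does not affect our use. If you would like to merge this into your side as a baseline, I am happy to sync the current branch over, or to co-author it together.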

PR description updated to match the current implementation.
